AITopics | conditional computation

48237d9f2dea8c74c2a72126cf63d933-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 17:30:55 GMT

arxiv preprint arxiv, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Neural Information Processing Systems | Apr-25-2026, 11:01:34 GMT

We propose Conditional Adapter (CODA), a parameter-efficient transfer learning method that also improves inference efficiency. CODA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CODA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CODA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CODA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.

artificial intelligence, coda, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Spiking Transformer with Experts Mixture

Neural Information Processing Systems | Mar-18-2026, 07:18:13 GMT

Spiking Neural Networks (SNNs) provide a sparse, spike-driven mechanism that is believed to be critical for energy-efficient deep learning. Mixture-of-Experts (MoE), on the other hand, aligns with the brain's mechanism of distributed and sparse processing, and offers an efficient way to enhance model capacity through conditional computation. In this work, we consider how to incorporate SNNs' spike-driven mechanism and MoE's conditional computation into a unified framework. However, MoE uses softmax to obtain dense conditional weights for each expert and TopK to hard-sparsify the network, which does not fit the properties of SNNs. To address this issue, we reformulate MoE in SNNs and introduce the Spiking Experts Mixture Mechanism (SEMM) from the perspective of sparse spiking activation. Both the experts and the router output spiking sequences, and their element-wise operation makes SEMM computation spike-driven and dynamically sparse-conditional. By integrating SEMM into the Spiking Transformer, we propose the Experts Mixture Spiking Attention (EMSA) and the Experts Mixture Spiking Perceptron (EMSP), which perform routing allocation for head-wise and channel-wise spiking experts, respectively. Experiments show that SEMM realizes sparse conditional computation and obtains a stable improvement on neuromorphic and static datasets, with computational overhead comparable to the Spiking Transformer baselines.
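As a rough illustration of the idea that both router and experts emit spikes and are combined element-wise, the following minimal sketch may help. It is not the authors' implementation: the simple threshold neuron (`heaviside`), the choice of element-wise product as the "element-wise operation", and the summation across experts are all assumptions made for this example, and the function names are hypothetical.

```python
def heaviside(values, threshold=1.0):
    """Emit a binary spike (1) wherever the input reaches the threshold.

    Assumption: a stateless threshold stands in for a real spiking neuron.
    """
    return [1 if v >= threshold else 0 for v in values]


def spiking_mixture(expert_currents, router_currents, threshold=1.0):
    """Gate each expert's spikes with the router's spikes, then mix.

    expert_currents[e] and router_currents[e] are equal-length lists of
    pre-activation values for expert e. Because both sides are binary,
    the element-wise product is a logical AND, so no softmax or TopK
    over dense weights is needed (assumed reading of the abstract).
    """
    expert_spikes = [heaviside(c, threshold) for c in expert_currents]
    router_spikes = [heaviside(c, threshold) for c in router_currents]
    gated = [
        [e * r for e, r in zip(es, rs)]
        for es, rs in zip(expert_spikes, router_spikes)
    ]
    # Assumption: gated expert outputs are summed per channel.
    mixed = [sum(column) for column in zip(*gated)]
    return mixed, gated


if __name__ == "__main__":
    experts = [[1.2, 0.3, 1.5], [0.9, 1.1, 2.0]]
    router = [[1.0, 1.0, 0.2], [1.4, 1.3, 1.0]]
    print(spiking_mixture(experts, router))
```

In this toy setting an expert contributes to a channel only when both it and the router spike there, which is one way to obtain sparsity that depends on the input.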

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

137101016144540ed3191dc2b02f09a5-Paper-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 00:50:54 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.71)
(2 more...)

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Neural Information Processing Systems | Feb-8-2026, 11:07:02 GMT

In such cases, the pretrained Transformer block can be skipped.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Subjective Depth and Timescale Transformers: Learning Where and When to Compute

Frederico Wieser, Martin Benfeghoul, Haitham Bou Ammar, Jun Wang, Zafeirios Fountas

arXiv.org Artificial Intelligence | Nov-27-2025

The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), two distinct architectures that leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute within decoder-only TFs. SDT augments a decoder-only stack with alternating Decision and Dynamic layers: a Decision layer computes a full block 'posterior' and a lightweight 'prior,' while a Dynamic layer employs fixed-capacity Top-K routing based on Bayesian surprise (Expected and Unexpected Change), maintaining a static compute graph. STT extends this conditional computation to the temporal domain: a transition network predicts residual updates, forming a temporal 'change hypothesis' that informs a router to dynamically execute or bypass TF blocks for each token, managing KV-cache contributions. Both architectures exhibit the predicted shift from novelty-driven to prediction-driven gating over training, suggesting alignment with surprise-based principles. While operating at reduced capacity, they offer preliminary insights into the compute-accuracy trade-offs of conditional computation. The proposed architectures establish a flexible framework for efficiency, reducing self-attention computation by 75% and KV-cache requirements by 50% within each compute-skipping layer, setting a pathway for more efficient models.
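To make fixed-capacity Top-K routing concrete, here is a minimal sketch under stated assumptions. The per-token surprise scores are taken as given, `block` is a hypothetical stand-in for a Transformer block applied per token, and the capacity of 0.5 is an assumption chosen because, if self-attention cost is quadratic in the number of routed tokens, it reproduces the 75% and 50% figures quoted above. None of the names below come from the paper.

```python
def route_top_k(surprise, capacity):
    """Return the positions of the k most surprising tokens, in order.

    k depends only on sequence length and capacity, never on the score
    values, so the amount of work is fixed ahead of time.
    """
    k = max(1, int(len(surprise) * capacity))
    ranked = sorted(range(len(surprise)), key=lambda i: surprise[i], reverse=True)
    return sorted(ranked[:k])


def apply_dynamic_layer(tokens, surprise, block, capacity=0.5):
    """Run `block` on routed tokens; all other tokens pass through unchanged."""
    selected = set(route_top_k(surprise, capacity))
    return [block(t) if i in selected else t for i, t in enumerate(tokens)]


def relative_costs(capacity):
    """Cost relative to a dense layer, assuming quadratic attention cost
    and a KV-cache that stores entries only for routed tokens."""
    return {"attention": capacity ** 2, "kv_cache": capacity}


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    surprise = [0.1, 0.8, 0.3, 0.9]
    print(apply_dynamic_layer(tokens, surprise, lambda t: t * 10))
    print(relative_costs(0.5))
```

With half the tokens routed, the assumed quadratic attention cost falls to one quarter of the dense cost (a 75% reduction) and the cache to one half.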

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.21408

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

137101016144540ed3191dc2b02f09a5-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 18:59:48 GMT

computation, semm, spiking transformer, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.71)
(2 more...)

Spiking Transformer with Experts Mixture

Neural Information Processing Systems | May-26-2025, 16:44:03 GMT

Spiking Neural Networks (SNNs) provide a sparse, spike-driven mechanism that is believed to be critical for energy-efficient deep learning. Mixture-of-Experts (MoE), on the other hand, aligns with the brain's mechanism of distributed and sparse processing, and offers an efficient way to enhance model capacity through conditional computation. In this work, we consider how to incorporate SNNs' spike-driven mechanism and MoE's conditional computation into a unified framework. However, MoE uses softmax to obtain dense conditional weights for each expert and TopK to hard-sparsify the network, which does not fit the properties of SNNs. To address this issue, we reformulate MoE in SNNs and introduce the Spiking Experts Mixture Mechanism (SEMM) from the perspective of sparse spiking activation. Both the experts and the router output spiking sequences, and their element-wise operation makes SEMM computation spike-driven and dynamically sparse-conditional.

artificial intelligence, machine learning, spiking transformer, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.80)

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Weiqiao Shan, Yuhao Zhang, Yuchen Han, Bei Li, Xiaofeng Zhao, Yuang Li, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

arXiv.org Artificial Intelligence | Jan-14-2025

Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations. However, our study reveals that while SSL features expedite model convergence, they conflict with traditional spectral features like FBanks in terms of update directions. In response, we propose a novel generalized feature fusion framework grounded in conditional computation, featuring a gradient-sensitive gating network and a multi-stage dropout strategy. This framework mitigates feature conflicts and bolsters model robustness to multi-view input features. By integrating SSL and spectral features, our approach accelerates convergence and maintains performance on par with spectral models across multiple speech translation tasks on the MuST-C dataset.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.08057

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Reviews: Modular Networks: Learning to Decompose Neural Computation

Neural Information Processing Systems | Oct-7-2024, 07:50:53 GMT

The paper is concerned with conditional computation, an interesting topic that is still at an early stage and therefore calls for further investigation. The paper proposes a latent-variable approach to constructing modular networks, modeling the choice of processing modules in a layer as a discrete latent variable. A modular network is composed of L modular layers, each comprising M modules and a controller. Each module is a function (a standard layer) f_i(x; \theta_i). The controller accepts the input, chooses K of the M modules to process it, and outputs their combined result as the layer's output. Modular layers can be stacked, or placed anywhere inside a standard network.
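The modular layer described in the review can be sketched as follows. This is only an illustration under assumptions: the controller here picks the K highest-scoring modules deterministically, whereas the paper treats the choice as a discrete latent variable, and summing the chosen modules' outputs is an assumption, since the review does not say how they are combined. All function names are hypothetical.

```python
import random


def make_linear_module(in_dim, out_dim, rng):
    """Build a toy module f_i(x; theta_i): a random linear map plus ReLU."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def module(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return module


def modular_layer(x, modules, controller_scores, k):
    """Run only K of the M modules on x and combine their outputs.

    controller_scores(x) returns one score per module. Only the chosen
    modules are evaluated, which is what makes the computation conditional.
    """
    scores = controller_scores(x)
    chosen = sorted(range(len(modules)), key=lambda i: scores[i], reverse=True)[:k]
    outputs = [modules[i](x) for i in chosen]
    # Assumption: chosen outputs are summed element-wise.
    combined = [sum(values) for values in zip(*outputs)]
    return combined, sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    modules = [make_linear_module(3, 2, rng) for _ in range(4)]
    scores = lambda x: [0.1, 0.9, 0.5, 0.2]  # fixed scores for illustration
    print(modular_layer([1.0, 2.0, 3.0], modules, scores, k=2))
```

Stacking several such layers, each with its own modules and controller, gives the L-layer modular network the review describes.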

computation, decompose neural computation, module, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)
